Object Identification

The Object Identification module enables you to train and publish models that auto-populate ontology fields by identifying the document class. Object identification leads to document source identification, which enables the application to look up source-specific information in the database. If the information is available, it is used to auto-populate the ontology fields as required. Auto-populating saves time because it is quicker and more reliable than the regular OCR-and-rule-based extraction.
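The lookup-and-populate flow described above can be pictured with a minimal Python sketch. It is illustrative only: the `SOURCE_RECORDS` dictionary, the class name `acme_logo`, the field names, and the `auto_populate` function are hypothetical stand-ins and do not represent the application's actual database or API.

```python
from typing import Optional

# Hypothetical stand-in for the database of source-specific information,
# keyed by the class that an object identification model detected.
SOURCE_RECORDS = {
    "acme_logo": {"Source Name": "Acme Corp", "Currency": "USD"},
}


def auto_populate(detected_class: Optional[str], fields: dict) -> dict:
    """Return a copy of `fields` with empty values filled from the source record.

    Only fields that already exist in `fields` are filled, and values that
    are already present are never overwritten.
    """
    record = SOURCE_RECORDS.get(detected_class or "", {})
    populated = dict(fields)
    for name, value in populated.items():
        if not value and name in record:
            populated[name] = record[name]
    return populated
```

In this sketch, a document with no detected class, or a class without a stored record, is returned unchanged, which corresponds to falling back to the regular extraction.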

Note

To use business keyword modeling for a given subtype of documents, the subtype must have a field of Source Name type.

The accuracy and reliability of the models in finding correct data depend on how well they have been trained.

Object identification models internally use Machine Learning (ML) concepts to identify objects present in the documents.

To open the Object Identification page, go to the main menu, and then under ML Studio, click Object Identification.

The following activities are involved in publishing a logo classification model:

  1. Uploading a batch

  2. Setting annotation

  3. Publishing the model

Uploading a batch

To train and publish a model, the first step is to upload a batch of documents that can be sampled and annotated. To upload a batch, follow these steps:

  1. On the Object Identification page, click to expand the Upload Batch panel.

Column header descriptions

| COLUMN NAME | DESCRIPTION |
| --- | --- |
| Batch Name | Displays the name of a given batch. |
| Last Updated Date | Displays the date on which the batch was last updated. |
| Version | Displays the version number of the batch. |
| Pending for annotation | Displays the name of documents pending in a batch for annotation. |
| Status | Displays the current status of the batch. |

To delete a batch, click the delete icon corresponding to the batch name.

  2. In the upper-right corner of the panel, click Add. The Add Batch window opens.

    Fill in the details based on the following field descriptions.

    | FIELD | DESCRIPTION |
    | --- | --- |
    | Batch Name* | The name of the batch that is to be created. |
    | Upload Document* | Click on the Select File link to upload the batch files. |
    | Selected Files | Displays the total number of selected files for a batch. |
    | Cancel | To exit without saving any unsaved changes. |
    | Save | To exit after saving the changes. |

    * Mandatory fields

Note

  1. The more files you upload, the better the model will perform. However, there is a trade-off between the number of available files and the time required to process them.

  2. Permitted file types are PDF, JPG, JPEG, PNG, TIFF, ZIP, BMP, DOC, XPS, and TXT.
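Before uploading, it can help to check a folder of files against the permitted types listed above. The following Python sketch is only an illustration: the function name `split_by_permitted` is hypothetical, and checking by file extension with case-insensitive matching is an assumption about how the application validates uploads.

```python
from pathlib import Path

# File types taken from the note above, expressed as lowercase extensions.
PERMITTED_EXTENSIONS = {
    ".pdf", ".jpg", ".jpeg", ".png", ".tiff",
    ".zip", ".bmp", ".doc", ".xps", ".txt",
}


def split_by_permitted(filenames):
    """Split file names into (accepted, rejected) lists based on extension."""
    accepted, rejected = [], []
    for name in filenames:
        # Assumption: extensions are compared case-insensitively.
        if Path(name).suffix.lower() in PERMITTED_EXTENSIONS:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected
```

Files in the rejected list would need to be converted to a permitted type or left out of the batch.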

Setting annotation and training the model

Annotating a batch refers to labeling the objects identified in a given batch (of documents). After annotating the required batch, you can train and publish the required model.

To annotate a batch, perform the following steps: 

  1. In the Upload Batch panel, select the batch that you want to annotate, and then in the bottom-right corner of the section, click Annotate. The Upload Batch panel collapses and the Set Annotation panel expands. Also, a message is displayed on the page that the documents (in the selected batch) are being processed.

    Note

    • If a batch does not need to be included in the model, you can delete it by clicking the respective icon.

  2. After all the documents in the batch are processed, the logo images appear one below the other in the panel.

You can use the features available on the taskbar to delete a document, navigate through it, change its displayed size, or rotate it.

  3. Select and label the objects one by one. To label an object, click the required object and select the class from the Class drop-down list. If required, you can also create a new class by clicking Add New Class in the drop-down list and then clicking Save.

    Note:

    To create a model, a minimum of 20 annotations is required in a given batch.

  4. To save the annotations for a given document, click the save icon. A success message briefly appears on the page after the annotations are saved.

  5. After you have saved the keyword annotations for the required documents in the batch, select Review to review the list of annotated keywords. The Review window opens and displays the annotations and their counts.

  6. To include or exclude specific images or labels in the model, click the inverted arrow next to the IMAGE or LABEL header. A list of images or labels is displayed. Select or clear the images or labels that you want to include or exclude from the model, and then click OK.

  7. Click Start Training. A message appears on the screen confirming that the model is being trained. After it is trained, the model appears under the Model training workbench page, ready to be published.
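The annotation counts shown in the Review window, together with the minimum of 20 annotations per batch, can be illustrated with a short Python sketch. The function name `review_annotations` and the label values are hypothetical; the application performs its own check, and this only mirrors the rule stated in the note above.

```python
from collections import Counter

# Minimum number of annotations per batch, as stated in the note above.
MIN_ANNOTATIONS = 20


def review_annotations(labels):
    """Return per-class annotation counts and whether the batch can be trained."""
    counts = Counter(labels)
    total = sum(counts.values())
    return counts, total >= MIN_ANNOTATIONS
```

For example, a batch with 12 annotations of one class and 7 of another totals 19 and would not yet qualify for training.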

Model metrics

The Model Metrics panel displays the object identification models available in the application. The models appear stacked in the order they get created (unless some filter or sorting is applied). Each model appears with an incremented version number as compared to its predecessor.

Column header descriptions

| COLUMN NAME | DESCRIPTION |
| --- | --- |
| Batch Name | Displays the name of a given batch. |
| Model Version | Displays the version of a given batch. |
| Train Set File Count | Displays the number of image profiles that have been used to train a given model. |
| Test Set File Count | Displays the number of image profiles that were used to test the model. |
| Date of Publish | Displays the date when a given model was published. |
| F1 Score/Accuracy | Displays the accuracy results (in %) based on the test done on the available logos. (See Test Set File Count column header.) |
| Class Name | Displays the class name to which the model is associated. |
| Status | Displays the status of the model. |

In the panel, some of the models may not yet be published. To publish them, click Publish. All models in the panel that are yet to be published are then published.
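For readers unfamiliar with the F1 Score/Accuracy column, the following Python sketch shows the standard definition of the F1 score as a percentage. This guide does not state how the application computes the value, so treat this as general background rather than the application's formula; the function name `f1_score_percent` is hypothetical.

```python
def f1_score_percent(true_positives, false_positives, false_negatives):
    """Return the standard F1 score for one class, expressed in percent.

    F1 is the harmonic mean of precision and recall.
    """
    if true_positives == 0:
        # With no correct detections, precision and recall are zero or undefined.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 100.0 * 2 * precision * recall / (precision + recall)
```

With 8 correct detections, 2 false detections, and 2 missed objects in the test set, precision and recall are both 0.8, which gives an F1 score of about 80%.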

Publishing a model

Publishing a model means making it available to the application for use in processing the documents. Published models are applied for processing one-by-one starting from the earliest version (1.00) available to the latest until one of them provides a result.
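The order in which published models are applied can be sketched as follows. This is a minimal illustration under stated assumptions: each model is represented as a hypothetical `(version, predict)` pair, and `predict` returns `None` when the model provides no result. Neither the function name nor this representation comes from the application.

```python
def apply_published_models(models, document):
    """Try published models from the earliest version to the latest.

    `models` is an iterable of (version, predict) pairs (hypothetical shape).
    Returns (version, result) for the first model that provides a result,
    or None if no model does.
    """
    for version, predict in sorted(models, key=lambda model: model[0]):
        result = predict(document)
        if result is not None:
            return version, result
    return None
```

One consequence of this ordering is that a later version is consulted only when every earlier published version fails to provide a result.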

Note:

Removing an ML model is a permanent action, and once removed, the model cannot be recovered. If you need it in the future, you will have to either re-publish or re-train the model according to your requirements.

To remove a published model, follow these steps:

  1. Access the Model Information page. This page provides a comprehensive overview of all the trained ML models available.

  2. On the Model Information page, under the Custom tab, locate the Object Identification ML model that you want to remove. Use the Search field to filter and find specific records within the displayed grid (refer to Searching data).

  3. Under the Action column, click the remove icon corresponding to the Object Identification ML model. An Attention window opens.

  4. Click Yes to permanently remove the ML model from the list. You can view the removed model on the Model Metrics page.

Re-publishing a model is a crucial step in ensuring that the latest improvements or modifications are reflected in its performance.

Note:

Before initiating the re-publishing process, carefully review the existing details of the model and make any necessary updates to align with your desired changes.

To re-publish a model, follow these steps:

  1. On the Model Metrics page, identify the ML model that you want to re-publish. Models that are not currently published are displayed with the status UnPublished. Use the Search field to filter and find specific records within the displayed grid (refer to Searching data).

  2. In the panel, select the unpublished model that you want to re-publish. A Publish button becomes visible.

  3. Click Publish to re-publish the model. Once published, the status of the model changes from UnPublished to Published, indicating that the re-publishing process was successful.

    You can view the published model on the Model Information page. Take a moment to review the details associated with the re-published model.